Enzyme promiscuity prediction using hierarchy-informed multi-label classification
Abstract
Motivation: As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities.
Results: We frame this 'enzyme promiscuity prediction' problem as a multi-label classification task. We maximally utilize inhibitor and unlabeled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbors similarity-based and other machine-learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split when compared to a random data split, and when evaluating performance on non-natural substrates compared to natural substrates.
Availability and implementation: We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP.
Supplementary information: Supplementary data are available at Bioinformatics online.
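To make the idea of hierarchy-informed multi-label targets concrete, here is a minimal sketch in Python. It is not taken from the EPP repository, and the function names (`ec_ancestors`, `build_label_index`, `encode_labels`) are hypothetical. It assumes only that EC numbers are dot-separated, four-level identifiers, so that each enzyme label implies its parent classes, and it builds a binary target vector that marks an enzyme together with all of its ancestors.

```python
from typing import Dict, Iterable, List


def ec_ancestors(ec: str) -> List[str]:
    """Return the EC number preceded by its ancestor classes, most general first."""
    parts = ec.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_label_index(ec_numbers: Iterable[str]) -> Dict[str, int]:
    """Assign a column index to every EC number and every ancestor class."""
    nodes = sorted({node for ec in ec_numbers for node in ec_ancestors(ec)})
    return {node: i for i, node in enumerate(nodes)}


def encode_labels(ecs_for_molecule: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Binary multi-label vector: 1 for each interacting enzyme and its ancestors."""
    vector = [0] * len(index)
    for ec in ecs_for_molecule:
        for node in ec_ancestors(ec):
            vector[index[node]] = 1
    return vector


# Illustrative EC numbers only; not drawn from the paper's dataset.
index = build_label_index(["1.1.1.1", "2.7.1.1"])
target = encode_labels(["1.1.1.1"], index)
```

A classifier trained on such vectors can be penalized or post-processed so that a positive prediction for a leaf enzyme is never paired with a negative prediction for its parent class. How EPP-HMCNF actually enforces the hierarchy is not described in the abstract.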
Similar resources
Air pollution prediction via multi-label classification
A Bayesian network classifier can be used to estimate the probability of an air pollutant overcoming a certain threshold. Yet multiple predictions are typically required regarding variables which are stochastically dependent, such as ozone measured in multiple stations or assessed according to different indicators. The common practice (independent approach) is to devise an independent classi...
Molecular signatures-based prediction of enzyme promiscuity
MOTIVATION Enzyme promiscuity, a property with practical applications in biotechnology and synthetic biology, has been related to the evolvability of enzymes. At the molecular level, several structural mechanisms have been linked to enzyme promiscuity in enzyme families. However, it is at present unclear to what extent these observations can be generalized. Here, we introduce for the first time...
Multi-Label Informed Feature Selection
Multi-label learning has been extensively studied in the areas of bioinformatics, information retrieval, multimedia annotation, etc. In multi-label learning, each instance is associated with multiple interdependent class labels, and the label information can be noisy and incomplete. In addition, multi-labeled data often has high-dimensional noisy, irrelevant and redundant features. As an effective d...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Journal
Journal title: Bioinformatics
Year: 2021
ISSN: 1367-4811, 1367-4803
DOI: https://doi.org/10.1093/bioinformatics/btab054